this may cause significant errors in quantization. Before introducing new methods to improve

the quantization process, we highlight the notation used in XNOR-Net [199] that will be used in our discussion. For each layer in a CNN, I is the input, W is the real-valued weight filter, B is the binarized weight filter (with entries ±1), and H is the binarized input.

Rastegari et al. [199] propose Binary-Weight-Networks (BWN) and XNOR-Networks.

BWN approximates the weights with binary values and is thus a variation of a BNN. XNOR-Networks binarize both the weights and the activations and are considered 1-bit networks. Both networks use the idea of a scaling factor. In BWN, the real-valued weight filter W is estimated

using a binary filter B and a scaling factor α. The convolutional operation is then approxi-

mated by:

$$I * W \approx (I \oplus B)\,\alpha, \tag{1.6}$$

where $\oplus$ indicates a convolution without multiplication. With this approximation, binary weight filters reduce memory usage by a factor of roughly 32× compared to single-precision filters. To ensure W is approximately equal to αB, BWN defines an optimization problem,

and the optimal solution is:

$$B^{*} = \operatorname{sign}(W), \tag{1.7}$$

$$\alpha^{*} = \frac{W^{T}\operatorname{sign}(W)}{n} = \frac{\sum_{i}|W_{i}|}{n} = \frac{1}{n}\lVert W\rVert_{\ell_{1}}. \tag{1.8}$$

Therefore, the optimal estimation of a binary weight filter can be achieved simply by taking the sign of the weight values, and the optimal scaling factor is the average of the absolute weight values. The scaling factor is also used to calculate the gradient in backpropagation.
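
As a concrete illustration of Eqs. (1.7) and (1.8), the following sketch binarizes a weight filter and computes its scaling factor in plain NumPy. It is a minimal illustration, not the reference BWN implementation; the function name `binarize_weight_filter` and the zero-to-+1 convention are our own assumptions.

```python
import numpy as np

def binarize_weight_filter(W):
    """Return (B, alpha) such that W is approximated by alpha * B."""
    B = np.sign(W)              # Eq. (1.7): B* = sign(W)
    B[B == 0] = 1               # assumed convention: map exact zeros to +1
    alpha = np.abs(W).mean()    # Eq. (1.8): alpha* = (1/n) * ||W||_l1
    return B, alpha

# Example: approximate a random 3x3x64x64 weight filter.
W = np.random.randn(3, 3, 64, 64).astype(np.float32)
B, alpha = binarize_weight_filter(W)
print(np.linalg.norm(W - alpha * B) / np.linalg.norm(W))  # relative approximation error
```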

The core idea of XNOR-Net is the same as that of BWN, but a second scaling factor, β, is used when binarizing the input I into H. Experiments show that this approach outperforms BinaryConnect and BNN by a large margin on ImageNet. Unlike XNOR-Net, which sets the scaling factor to the mean of the absolute weights, Xu et al. [266] define a trainable scaling factor for both weights and activations. LQ-Nets [284] quantize both weights and activations with arbitrary bit-widths, including 1-bit. Their quantizers are learnable yet remain compatible with bitwise operations, which preserves the fast-inference merit of properly quantized neural networks (QNNs).
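
To make the role of the two scaling factors concrete, the sketch below approximates a single dot product in the XNOR-Net spirit: both operands are binarized, α scales the weights, and β scales the inputs. It is a simplified illustration in plain NumPy (the actual XNOR-Net computes β per spatial location and runs XNOR/popcount kernels); the function name `xnor_dot` is an assumption.

```python
import numpy as np

def xnor_dot(x, w):
    """Approximate x . w with binarized operands and two scaling factors."""
    h = np.sign(x)              # binarized input H
    b = np.sign(w)              # binarized weights B
    beta = np.abs(x).mean()     # input scaling factor beta
    alpha = np.abs(w).mean()    # weight scaling factor alpha
    # np.dot(h, b) stands in for what an XNOR + popcount kernel computes in hardware.
    return alpha * beta * np.dot(h, b)

x = np.random.randn(256)
w = np.random.randn(256)
print(xnor_dot(x, w), np.dot(x, w))  # compare the binary approximation with the exact dot product
```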

Based on XNOR-Net [199], the High-Order Residual Quantization (HORQ) [138] pro-

vides a high-order binarization scheme, which achieves a more accurate approximation while

still having the advantage of binary operations. HORQ calculates the residual error and then

performs a new round of thresholding operations to approximate the residual further. This

binary approximation of the residual can be considered a higher-order binary input. Follow-

ing XNOR-Net, HORQ defines the first-order residual tensor R_1(X) by computing the difference between the real input and the first-order binary quantization:

$$R_{1}(X) = X - \beta_{1}H_{1} \approx \beta_{2}H_{2}, \tag{1.9}$$

where R_1(X) is a real-valued tensor. By analogy, R_2(X) is the second-order residual tensor, which is in turn approximated by β_3 H_3. Recursively performing the above operations yields the order-K residual quantization:

$$X \approx \sum_{i=1}^{K} \beta_{i}H_{i}. \tag{1.10}$$

During the training of the HORQ network, the input tensor can be reshaped into a matrix and expressed by a residual quantization of any order. Experiments show that HORQ-Net outperforms XNOR-Net in accuracy on the CIFAR datasets.
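
A minimal sketch of the order-K recursion behind Eq. (1.10), assuming each residual is binarized with the same sign/mean-of-absolute-values rule as above; the function name `horq_quantize` is illustrative and not taken from the HORQ code base.

```python
import numpy as np

def horq_quantize(X, K):
    """Order-K residual quantization of tensor X (sketch of Eq. 1.10).

    Returns scaling factors beta_i and binary tensors H_i such that
    X is approximated by sum_i beta_i * H_i.
    """
    betas, Hs = [], []
    residual = X.astype(np.float64)
    for _ in range(K):
        H = np.sign(residual)
        H[H == 0] = 1                   # assumed convention for exact zeros
        beta = np.abs(residual).mean()  # scale for the current order
        betas.append(beta)
        Hs.append(H)
        residual = residual - beta * H  # e.g. R_1(X) = X - beta_1 * H_1
    return betas, Hs

X = np.random.randn(4, 4)
betas, Hs = horq_quantize(X, K=2)
X_hat = sum(b * H for b, H in zip(betas, Hs))
print(np.linalg.norm(X - X_hat))        # residual error shrinks as K grows
```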